A system for automatic broadcast news summarisation, geolocation and translation
Abstract
An increasing amount of news content is produced in audio-video form every day. To analyse and monitor this multilingual data stream effectively, we require methods to extract and present audio content in accessible ways. In this paper, we describe an end-to-end system for processing and browsing audio news data. This fully automated system brings together our recent research on audio scene analysis, speech recognition, summarisation, named entity detection, geolocation, and machine translation. The graphical interface allows users to visualise the distribution of news content by entity name and story location. Browsing of news events is facilitated through extractive summaries and the ability to view transcripts in multiple languages.
Similar resources
Are Extractive Text Summarisation Techniques Portable to Broadcast News?
In this paper we report on a series of experiments which compare the effect of individual features on both text and speech summarisation, the effect of basing the speech summaries on automatic speech recognition transcripts with varying word error rates, and the effect of summarisation approach and transcript source on summary quality. We show that classical text summarisation features (based o...
Aspects of Multilingual News Summarisation
In this book chapter, we discuss several pertinent aspects of an automatic system that generates summaries in multiple languages for sets of topic-related news articles (multilingual multi-document summarisation), gathered by news aggregation systems. The discussion follows a framework based on Latent Semantic Analysis (LSA) because LSA was shown to be a high-performing method across many diffe...
From Text Summarisation to Style-Specific Summarisation for Broadcast News
In this paper we report on a series of experiments investigating the path from text summarisation to style-specific summarisation of spoken news stories. We show that the portability of traditional text summarisation features to broadcast news is dependent on the diffusiveness of the information in the broadcast news story. An analysis of two categories of news stories (containing only read spe...
Multi-stage compaction approach to broadcast news summarisation
This paper presents a fully automatic, multi-stage compaction approach to broadcast news summarisation, targeting transcripts from automatic speech recognition (ASR) systems. It employs a network of multi-layer perceptrons to remove incorrectly transcribed words based on confidence scores, and to select significant chunks at multiple stages based on tf.idf scores and named entity frequency. The...
Statistical Machine Translation of Broadcast News from Spanish to Portuguese
In this paper we describe the work carried out to develop an automatic system for translation of broadcast news from Spanish to Portuguese. Two challenging topics of speech and language processing were involved: Automatic Speech Recognition (ASR) of the Spanish News and Statistical Machine Translation (SMT) of the results to the Portuguese language. ASR of broadcast news is based on the AUDIMUS...